Reinforcement Learning (RL) can enable agents to learn complex tasks. However, it is difficult to interpret the learned knowledge and reuse it across tasks. Inductive biases can address such issues by explicitly providing a generic yet useful decomposition that is otherwise difficult or expensive to learn implicitly. For example, object-centered approaches decompose a high-dimensional observation into individual objects. Expanding on this, we utilize an inductive bias for explicit object-centered knowledge separation that provides a further decomposition into semantic representations and dynamics knowledge. For this, we introduce a semantic module that predicts an object's semantic state based on its context. The resulting affordance-like object state can then be used to enrich perceptual object representations. With a minimal setup and an environment that enables puzzle-like tasks, we demonstrate the feasibility and benefits of this approach. Specifically, we compare three different methods of integrating semantic representations into a model-based RL architecture. Our experiments show that the degree of explicitness in knowledge separation correlates with faster learning, better accuracy, better generalization, and better interpretability.
Translated by Google Translate
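Humans can learn to recognize new objects even from a few examples. In contrast, training deep-learning-based object detectors requires large amounts of annotated data. To avoid the need to acquire and annotate these large amounts of data, few-shot object detection aims to learn from a few object instances of new categories in the target domain. In this survey, we provide an overview of the state of the art in few-shot object detection. We categorize approaches according to their training scheme and architectural layout. For each type of approach, we describe the general realization as well as concepts that improve performance on novel categories. Where appropriate, we give short takeaways on these concepts to highlight the best ideas. Finally, we introduce commonly used datasets and their evaluation protocols and analyze reported benchmark results. As a result, we emphasize common challenges in evaluation and identify the most promising current trends in this emerging field of few-shot object detection.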
Participants in political discourse employ rhetorical strategies -- such as hedging, attributions, or denials -- to display varying degrees of belief commitments to claims proposed by themselves or others. Traditionally, political scientists have studied these epistemic phenomena through labor-intensive manual content analysis. We propose to help automate such work through epistemic stance prediction, drawn from research in computational semantics, to distinguish at the clausal level what is asserted, denied, or only ambivalently suggested by the author or other mentioned entities (belief holders). We first develop a simple RoBERTa-based model for multi-source stance predictions that outperforms more complex state-of-the-art modeling. Then we demonstrate its novel application to political science by conducting a large-scale analysis of the Mass Market Manifestos corpus of U.S. political opinion books, where we characterize trends in cited belief holders -- respected allies and opposed bogeymen -- across U.S. political ideologies.
Knowledge distillation (KD) has gained a lot of attention in the field of model compression for edge devices thanks to its effectiveness in compressing large, powerful networks into smaller, lower-capacity models. Online distillation, in which the teacher and the student learn collaboratively, has also gained much interest due to its ability to improve the performance of the networks involved. The Kullback-Leibler (KL) divergence ensures proper knowledge transfer between the teacher and the student. However, most online KD techniques present some bottlenecks under the network capacity gap. When the models are trained cooperatively and simultaneously, the KL distance becomes incapable of properly minimizing the discrepancy between the teacher's and student's distributions. Alongside accuracy, critical edge device applications need well-calibrated compact networks. Confidence calibration provides a sensible way of getting trustworthy predictions. We propose BD-KD: Balancing of Divergences for online Knowledge Distillation. We show that adaptively balancing between the reverse and forward divergences shifts the focus of the training strategy to the compact student network without limiting the teacher network's learning process. We demonstrate that, by performing this balancing design at the level of the student distillation loss, we improve upon both the accuracy and the calibration of the compact student network. We conducted extensive experiments using a variety of network architectures and show improvements on multiple datasets including CIFAR-10, CIFAR-100, Tiny-ImageNet, and ImageNet. We illustrate the effectiveness of our approach through comprehensive comparisons and ablations with current state-of-the-art online and offline KD techniques.
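To make the idea of balancing forward and reverse divergences concrete, the following is a minimal sketch in plain Python. It is not the BD-KD loss itself: the abstract does not specify how the balance is adapted, so the fixed weight `alpha`, the function names, and the temperature parameter are all assumptions made purely for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def balanced_distillation_loss(teacher_logits, student_logits,
                               alpha=0.5, temperature=1.0):
    """Weighted sum of forward and reverse KL between teacher and student.

    alpha is a hypothetical fixed weight; an adaptive scheme would
    choose it per sample or per training step.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    forward = kl_divergence(teacher, student)  # KL(teacher || student)
    reverse = kl_divergence(student, teacher)  # KL(student || teacher)
    return alpha * forward + (1.0 - alpha) * reverse
```

Setting `alpha=1.0` recovers the usual forward-KL distillation term, while `alpha=0.0` uses only the reverse divergence; intermediate values trade the two off.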
Deep Reinforcement Learning (RL) agents are susceptible to adversarial noise in their observations that can mislead their policies and decrease their performance. However, an adversary may be interested not only in decreasing the reward, but also in modifying specific temporal logic properties of the policy. This paper presents a metric that measures the exact impact of adversarial attacks against such properties. We use this metric to craft optimal adversarial attacks. Furthermore, we introduce a model checking method that allows us to verify the robustness of RL policies against adversarial attacks. Our empirical analysis confirms (1) the quality of our metric to craft adversarial attacks against temporal logic properties, and (2) that we are able to concisely assess a system's robustness against attacks.
Linear-quadratic regulators (LQR) are a well-known and widely used tool in control theory for both linear and nonlinear dynamics. For nonlinear problems, an LQR-based controller is usually only locally viable, which raises the problem of estimating the region of attraction (ROA). The need for good ROA estimations becomes especially pressing for underactuated systems, as a failure of controls might lead to unsafe and unrecoverable system states. Known approaches based on optimization or sampling, while working well, might be too slow in time-critical applications and are hard to verify formally. In this work, we propose a novel approach to estimate the ROA based on the analytic solutions to linear ODEs for the torque-limited simple pendulum. In simulation and physical experiments, we compared our approach to a Lyapunov-sampling baseline approach and found that our approach was faster to compute, while yielding ROA estimations of similar phase space area.
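For head and neck cancer (HNC) patient management, automatic gross tumor volume (GTV) segmentation and accurate pre-treatment cancer recurrence prediction are of great importance for helping physicians design personalized management plans, which have the potential to improve the treatment outcome and quality of life of HNC patients. In this paper, we developed an automated primary tumor (GTVp) and lymph node (GTVn) segmentation method based on combined pre-treatment positron emission tomography/computed tomography (PET/CT) scans of HNC patients. We extracted radiomics features from the segmented tumor volume and constructed a multi-modality tumor recurrence-free survival (RFS) prediction model, which fuses the predictions made by separate CT radiomics, PET radiomics, and clinical models. We performed 5-fold cross-validation to train and evaluate our methods on the MICCAI 2022 HEad and neCK TumOR segmentation and outcome prediction challenge (HECKTOR) dataset. The ensemble prediction on the testing cohort achieved scores of 0.77 and 0.73 for GTVp and GTVn segmentation, respectively, and a C-index value of 0.67 for RFS prediction. The code is publicly available (https://github.com/wangkaiwan/hecktor-2022-airt). Our team name is AIRT.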
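For readers unfamiliar with the C-index reported for RFS prediction, the sketch below computes Harrell's concordance index in plain Python. It is an illustrative assumption rather than the challenge's evaluation code: the abstract does not state which C-index variant or tie handling was used, and the function and argument names are hypothetical.

```python
def concordance_index(times, events, risks):
    """Harrell's C-index for right-censored survival data.

    times  -- observed time (event or censoring) for each patient
    events -- 1 if the event (e.g. recurrence) was observed, 0 if censored
    risks  -- predicted risk score; higher means earlier expected event

    A pair (i, j) is comparable when patient i had an observed event
    strictly before patient j's observed time. Ties in predicted risk
    count as half concordant (an assumed convention).
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 1.0 means the predicted risks order every comparable pair correctly, 0.5 corresponds to random ordering, and 0.0 means every pair is ordered the wrong way round.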
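Leveraging quantum effects such as entanglement and coherence in metrology allows one to measure parameters with enhanced sensitivity. However, time-dependent noise can disrupt such Heisenberg-limited amplification. We propose a quantum metrology method based on the quantum signal processing framework to overcome these realistic noise-induced limitations in practical quantum metrology. Our algorithm separates the gate parameter $\varphi$ (single-qubit Z phase), which is susceptible to time-dependent error, from the target gate parameter $\theta$ (swap angle between the |10> and |01> states), which is largely free of time-dependent error. Our method achieves an accuracy of $10^{-4}$ radians for learning $\theta$ in superconducting-qubit experiments, outperforming existing alternatives by two orders of magnitude. We also demonstrate increased robustness in learning time-dependent gate parameters through the fast Fourier transform and sequential phase differences. We show both theoretically and numerically that there is an interesting transition in the optimal metrology variance scaling as a function of circuit depth $d$, from the pre-asymptotic regime $d \ll 1/\theta$ to the Heisenberg limit $d \to \infty$. Remarkably, in the pre-asymptotic regime, the estimation variance of our method on the time-sensitive parameter $\varphi$ scales faster than the asymptotic Heisenberg limit as a function of depth, $\text{Var}(\hat{\varphi}) \approx 1/d^4$. Our work is the first quantum signal processing algorithm to demonstrate practical application in laboratory quantum computers.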
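Knowledge distillation (KD) is an effective tool for compressing deep classification models for edge devices. However, the performance of KD is affected by the large capacity gap between the teacher and student networks. Recent methods have resorted to a multiple teacher assistant (TA) setting for KD, which sequentially decreases the size of the teacher model to relatively bridge the size gap between these models. This paper proposes a new technique called Curriculum Expert Selection for Knowledge Distillation to efficiently enhance the learning of a compact student under the capacity gap problem. The technique is built upon the hypothesis that a student network should be guided gradually using a stratified teaching curriculum, as it learns easy (hard) data samples better and faster from a lower (higher) capacity teacher network. Specifically, our method is a gradual TA-based KD technique that selects a single teacher per input image based on a curriculum driven by the difficulty of classifying the image. In this work, we empirically verify our hypothesis, conduct rigorous experiments on the CIFAR-10, CIFAR-100, CINIC-10, and ImageNet datasets, and show improved accuracy on VGG-like models, ResNets, and WideResNets architectures.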
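This paper presents COOL-MC, a tool that integrates state-of-the-art reinforcement learning (RL) and model checking. Specifically, the tool builds upon the OpenAI gym and the probabilistic model checker Storm. COOL-MC provides the following features: (1) a simulator to train RL policies in the OpenAI gym for Markov decision processes (MDPs) that are defined as input for Storm, (2) a new model builder for Storm, which uses callback functions to verify (neural network) RL policies, (3) formal abstractions that relate models and policies specified in OpenAI gym or Storm, and (4) algorithms to obtain bounds on the performance of so-called permissive policies. We describe the components and architecture of COOL-MC and demonstrate its features on multiple benchmark environments.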